Here, we present the draft genome of the endosymbiont "Candidatus Ruthia magnifica" UCD-CM, a member of the phylum Proteobacteria, found in the gills of the deep-sea giant clam Calyptogena magnifica. The assembly consists of 1,160,249 bp contained in 18 contigs.
The gammaproteobacterial endosymbiont "Candidatus Ruthia magnifica" was previously found to be an obligate, intracellular autotroph in one species of giant clam, Calyptogena magnifica (1-3). "Candidatus Ruthia magnifica" possesses the ability to fix carbon for its host, although the specific biochemical mechanisms of this ability remain elusive (4, 5).

Calyptogena magnifica was collected on 28 May 2002 from a hydrothermal vent in the Galapagos Rift during a deep-sea exploration by the submersible DSV Alvin (dive 3790) (6). Gill tissue was dissected and frozen in liquid nitrogen. Genomic DNA was extracted as previously described for environmental samples (7). Illumina paired-end libraries were prepared using a modified version of the Illumina Nextera kit with an in-house-produced transposase.

A total of 3,587,578 paired-end reads were generated on an Illumina MiSeq at a read length of 160 bp. Quality trimming and error correction of the reads resulted in 3,500,962 high-quality reads. All sequence processing and assembly were performed using the A5 assembly pipeline (8), which automates error correction, data cleaning, scaffolding, contig assembly, and quality control. The resulting assembly comprised 17,632 contigs with an N50 of 591 bp. Screening the contigs with NCBI BLASTx against NCBI's nonredundant GenBank database showed a preponderance of hits to organisms other than "Candidatus Ruthia magnifica" (human, Escherichia coli, or others). A subsequent BLAST filter against a "Candidatus Ruthia magnifica" reference database, however, identified 18 of these contigs as "Candidatus Ruthia magnifica," raising the N50 to 105,440 bp. The resulting genome consisted of 1,160,249 bp, with a GC content of 34% and an overall coverage estimate of 19×. Scaffolds were verified by mapping error-corrected reads to the assembly using the Burrows-Wheeler Aligner (BWA) (9). Genome completeness was assessed using PhyloSift (10), which searches for 37 highly conserved, single-copy marker genes (11); all 37 were found in this assembly.
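The contig filtering and summary statistics reported above (N50 before and after filtering, GC content, mean coverage) can be illustrated with a minimal sketch. This is not the authors' A5/BLAST workflow; it is a hypothetical reimplementation under stated assumptions: BLAST results are in standard tabular format (`-outfmt 6`, query ID in column 1, e-value in column 11), a contig is retained if it has any hit at or below an assumed e-value cutoff of 1e-10, and mean coverage is taken as aligned bases divided by assembly length (the original does not say how the 19× estimate was computed). The function names and toy contigs are illustrative only.

```python
from typing import Iterable, List, Set


def parse_blast_hits(lines: Iterable[str], max_evalue: float = 1e-10) -> Set[str]:
    """Return query IDs with at least one hit at or below max_evalue.

    Assumes BLAST tabular output (-outfmt 6): column 1 is the query ID and
    column 11 is the e-value. The cutoff is a hypothetical choice.
    """
    hits: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # malformed row; ignore in this sketch
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def n50(lengths: List[int]) -> int:
    """Largest L such that contigs of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def gc_content(seqs: Iterable[str]) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases (N and others ignored)."""
    gc = at = 0
    for seq in seqs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return gc / (gc + at) if gc + at else 0.0


def mean_coverage(mapped_bases: int, assembly_length: int) -> float:
    """Mean depth = total aligned bases / assembly length (assumed definition)."""
    return mapped_bases / assembly_length


# Toy example: two "target" contigs plus four short off-target contigs.
contigs = {
    "c1": "ATGCATAT",
    "c2": "GGCCAT",
    "x1": "AT",
    "x2": "GC",
    "x3": "TA",
    "x4": "CG",
}
blast_lines = [
    "c1\tref_1\t99.0\t8\t0\t0\t1\t8\t1\t8\t1e-30\t50",
    "c2\tref_2\t98.0\t6\t0\t0\t1\t6\t1\t6\t1e-20\t40",
    "x1\tref_3\t90.0\t2\t0\t0\t1\t2\t1\t2\t0.01\t10",
]
keep = parse_blast_hits(blast_lines)
kept = {cid: seq for cid, seq in contigs.items() if cid in keep}

print(n50([len(s) for s in contigs.values()]))  # 6 (all contigs)
print(n50([len(s) for s in kept.values()]))     # 8 (after filtering)
print(round(gc_content(kept.values()), 3))      # 0.429
```

As in the assembly described above, discarding many short off-target contigs raises the N50 even though total assembly length drops, because N50 is weighted toward the longest contigs that remain.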

Automated annotation was performed using the RAST server (12). "Candidatus Ruthia magnifica" strain UCD-CM contains 1,215 predicted protein-coding genes and 40 predicted noncoding RNAs.

Nucleotide sequence accession numbers. This whole-genome shotgun project has been deposited in DDBJ/EMBL/GenBank under the accession number JARW00000000. The version described in this paper is the first version, JARW01000000.